T3L: Translate-and-Test Transfer Learning for Cross-Lingual Text Classification
Abstract
Cross-lingual text classification leverages text classifiers trained in a high-resource language to perform text classification in other languages with no or minimal fine-tuning (zero/few-shot cross-lingual transfer). Nowadays, cross-lingual text classifiers are typically built on large-scale, multilingual language models (LMs) pretrained on a variety of languages of interest. However, the performance of these models varies significantly across languages and classification tasks, suggesting that the superposition of the language modelling and classification tasks is not always effective. For this reason, in this paper we propose revisiting the classic “translate-and-test” pipeline to neatly separate the translation and classification stages. The proposed approach couples 1) a neural machine translator translating from the targeted language to a high-resource language, with 2) a text classifier trained in the high-resource language, but the neural machine translator generates “soft” translations to permit end-to-end backpropagation during fine-tuning of the pipeline. Extensive experiments have been carried out over three cross-lingual text classification datasets (XNLI, MLDoc, and MultiEURLEX), with the results showing that the proposed approach has significantly improved performance over a competitive baseline.
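The role of the “soft” translations can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a soft translation is the probability-weighted average of token embeddings, uses a toy linear classifier, and replaces backpropagation with a finite-difference estimate so that it runs in plain Python. All names and values (`EMBEDDINGS`, `LOGITS`, `WEIGHTS`, the helper functions) are hypothetical. The point it shows is that a soft token lets the classifier's score respond to a small change in the translator's logits, whereas a hard argmax token does not.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def soft_embedding(logits, embedding_table):
    """Expected embedding under the translator's token distribution."""
    probs = softmax(logits)
    dim = len(embedding_table[0])
    return [sum(p * row[d] for p, row in zip(probs, embedding_table))
            for d in range(dim)]

def hard_embedding(logits, embedding_table):
    """Embedding of the single most likely token (argmax decoding)."""
    best = max(range(len(logits)), key=lambda i: logits[i])
    return list(embedding_table[best])

def classifier_score(embedding, weights):
    """Toy linear classifier: inner product of embedding and weights."""
    return sum(e * w for e, w in zip(embedding, weights))

def finite_difference(embed_fn, logits, embedding_table, weights, index, eps=1e-4):
    """Numerical derivative of the classifier score w.r.t. one translator logit."""
    bumped = list(logits)
    bumped[index] += eps
    base = classifier_score(embed_fn(logits, embedding_table), weights)
    moved = classifier_score(embed_fn(bumped, embedding_table), weights)
    return (moved - base) / eps

# Hypothetical toy values: 3-token vocabulary, 2-dimensional embeddings.
EMBEDDINGS = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
LOGITS = [2.0, 0.5, -1.0]   # translator output for one target position
WEIGHTS = [0.3, -0.7]       # classifier parameters

# Sensitivity of the classifier score to the second translator logit.
soft_grad = finite_difference(soft_embedding, LOGITS, EMBEDDINGS, WEIGHTS, index=1)
hard_grad = finite_difference(hard_embedding, LOGITS, EMBEDDINGS, WEIGHTS, index=1)

print("soft translation sensitivity:", soft_grad)
print("hard translation sensitivity:", hard_grad)
```

In this sketch the hard path yields no signal because a small change in the logits leaves the argmax token unchanged, while the soft path passes a nonzero signal from the classifier back to the translator. That signal is what end-to-end fine-tuning of a translate-and-test pipeline relies on.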
Similar resources
Semi-Supervised Representation Learning for Cross-Lingual Text Classification
Cross-lingual adaptation aims to learn a prediction model in a label-scarce target language by exploiting labeled data from a label-rich source language. An effective cross-lingual adaptation system can substantially reduce the manual annotation effort required in many natural language processing tasks. In this paper, we propose a new cross-lingual adaptation approach for document classification ...
Cross-lingual Distillation for Text Classification
Cross-lingual text classification (CLTC) is the task of classifying documents written in different languages into the same taxonomy of categories. This paper presents a novel approach to CLTC that builds on model distillation, which adapts and extends a framework originally proposed for model compression. Using soft probabilistic predictions for the documents in a label-rich language as the (ind...
Transfer learning for text classification
Linear text classification algorithms work by computing an inner product between a test document vector and a parameter vector. In many such algorithms, including naive Bayes and most TF-IDF variants, the parameters are determined by some simple, closed-form function of training set statistics; we call this mapping from statistics to parameters the parameter function. Much research in ...
Active Learning for Cross-Lingual Sentiment Classification
Cross-lingual sentiment classification aims to predict the sentiment orientation of a text in one language (the target language) with the help of resources from another language (the source language). However, current cross-lingual performance is normally far from satisfactory due to the large differences in linguistic expression and social culture. In this paper, we sugg...
Semi-Supervised Matrix Completion for Cross-Lingual Text Classification
Cross-lingual text classification is the task of assigning labels to observed documents in a label-scarce target language domain by using a prediction model trained with labeled documents from a label-rich source language domain. Cross-lingual text classification is widely studied in natural language processing as a way to reduce the expensive manual annotation effort required in the target lang...
Journal
Journal title: Transactions of the Association for Computational Linguistics
Year: 2023
ISSN: 2307-387X
DOI: https://doi.org/10.1162/tacl_a_00593